The dual of the margin: improved analyses and rates for gradient descent’s implicit bias

Ziwei Ji (University of Illinois)

16-Dec-2020, 17:00-18:00

dynamical systems, numerical analysis, probability, statistics theory

Audience: researchers in the topic

Comments: The implicit bias of gradient descent, and specifically its margin-maximization properties, has arisen as a promising explanation for the good generalization of deep networks. The purpose of this talk is to demonstrate the effectiveness of a dual problem to smoothed margin maximization. Concretely, this talk will develop this dual, as well as a variety of consequences in linear and nonlinear settings.
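
As background for the linear case (standard duality, not necessarily the talk's exact formulation), the hard margin admits the dual representation

\[
\max_{\|w\|_2 \le 1} \; \min_{1 \le i \le n} \; y_i \langle w, x_i \rangle
\;=\;
\min_{q \in \Delta_n} \; \Bigl\| \sum_{i=1}^{n} q_i \, y_i x_i \Bigr\|_2 ,
\]

where \Delta_n is the probability simplex over the n examples. Normalizing the negative gradient of an exponential-type empirical loss yields such a convex combination of the y_i x_i, i.e. a dual-feasible point, which is one way to read "the dual of the margin" in the title.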

In the linear case, this dual perspective will first yield fast 1/t rates for margin maximization and implicit bias. This is faster than all prior first-order hard-margin SVM solvers, which achieve at best a 1/√t rate.
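
For intuition only, the following minimal numpy sketch (plain constant-step gradient descent, not the faster method from the talk; the data, step size, and horizon are arbitrary illustrative choices) runs gradient descent on the exponential loss over a separable toy dataset and prints the normalized margin, which increases toward the maximum margin; the talk's dual analysis concerns how fast this happens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Separable toy data in R^2: labels y_i in {+1, -1}, shifted apart so a positive margin exists.
n = 50
X = rng.normal(size=(n, 2))
w_star = np.array([2.0, -1.0])
y = np.sign(X @ w_star)
X = X + 0.5 * y[:, None] * w_star / np.linalg.norm(w_star)
Z = y[:, None] * X  # rows z_i = y_i * x_i, so the margin of w is min_i <w, z_i>

w = np.zeros(2)
lr = 0.1
for t in range(1, 100_001):
    e = np.exp(-(Z @ w))   # per-example exponential losses exp(-y_i <w, x_i>)
    grad = -(Z.T @ e) / n  # gradient of L(w) = (1/n) sum_i exp(-y_i <w, x_i>)
    w = w - lr * grad
    if t in (100, 1_000, 10_000, 100_000):
        # normalized margin min_i y_i <w, x_i> / ||w||, increasing toward the max margin
        print(t, (Z @ w).min() / np.linalg.norm(w))
```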

Secondly, the dual analysis allows a characterization of the implicit bias even outside the standard setting of exponentially-tailed losses; in this sense, it is gradient descent itself, and not a particular loss structure, that leads to the implicit bias.

In the nonlinear case, duality will enable the proof of a gradient alignment property: asymptotically, the parameters and their gradients become collinear. Although abstract, this property in turn implies various existing and new margin maximization results.
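
To make the alignment property concrete, here is a small illustrative numpy experiment (the architecture, data, and hyperparameters are arbitrary choices, not taken from the talk): a bias-free two-layer ReLU network, homogeneous in its parameters, is trained with gradient descent on the logistic loss over separable data, and the cosine between the parameter vector and the negative gradient is printed; this cosine typically increases toward 1 as training proceeds, illustrating asymptotic collinearity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Separable toy data and a two-layer ReLU network without biases (homogeneous in its parameters).
n, d, m = 40, 2, 16
X = rng.normal(size=(n, d))
w_star = np.array([1.0, 1.0])
y = np.sign(X @ w_star)
X = X + 0.5 * y[:, None] * w_star / np.linalg.norm(w_star)

W = rng.normal(size=(m, d)) / np.sqrt(d)
v = rng.normal(size=m) / np.sqrt(m)

lr = 0.1
for t in range(1, 50_001):
    H = np.maximum(X @ W.T, 0.0)                 # hidden activations, shape (n, m)
    f = H @ v                                    # network outputs f(x_i; W, v)
    s = 0.5 * (1.0 - np.tanh(0.5 * y * f))       # stable sigmoid(-y_i f_i)
    g_f = -y * s / n                             # d/df_i of the mean logistic loss
    grad_v = H.T @ g_f                           # gradient w.r.t. v, shape (m,)
    grad_W = ((H > 0) * v * g_f[:, None]).T @ X  # gradient w.r.t. W, shape (m, d)
    if t % 10_000 == 0:
        theta = np.concatenate([W.ravel(), v])
        g = np.concatenate([grad_W.ravel(), grad_v])
        # cosine between the parameters and the negative gradient
        print(t, -(theta @ g) / (np.linalg.norm(theta) * np.linalg.norm(g)))
    v = v - lr * grad_v
    W = W - lr * grad_W
```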

Joint work with Matus Telgarsky.


One World Seminar Series on the Mathematics of Machine Learning

Organizers: Philipp Petersen*, Wuyang Chen, Stephan Wojtowytsch, Matthew Thorpe, Issa Karambal, Tiffany Vlaar, Franca Hoffmann
*contact for this listing

Export talk to